I used a dataset containing Airbnb listings in 10 European cities, with the following attributes available for each listing:
Data source: Kaggle https://www.kaggle.com/datasets/thedevastator/airbnb-prices-in-european-cities
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import statsmodels.api as sm
from scipy.stats import pearsonr
cities_list = ['amsterdam', 'athens', 'barcelona', 'berlin', 'budapest',
'lisbon', 'london', 'paris', 'rome', 'vienna']
weekdays_dict = {}
weekends_dict = {}
for city in cities_list:
    df_weekdays = pd.read_csv(f'data/{city}_weekdays.csv')
    df_weekends = pd.read_csv(f'data/{city}_weekends.csv')
    # city variable
    df_weekdays['city'] = city
    df_weekends['city'] = city
    # dummy variable for weekend
    df_weekdays['is_weekend'] = 0
    df_weekends['is_weekend'] = 1
    df_weekdays['week_part'] = 'weekday'
    df_weekends['week_part'] = 'weekend'
    weekdays_dict[city] = df_weekdays
    weekends_dict[city] = df_weekends
    print(city, f'weekdays: {len(df_weekdays)}, weekends: {len(df_weekends)}')
# combine into one dataframe
df = pd.concat(list(weekdays_dict.values())+list(weekends_dict.values())).drop(columns=['Unnamed: 0'])
# realSum is the full price for two people for two nights; divide by 2 for price per night
df['price_per_night'] = df['realSum']/2
df.info()
Athens has the best guest satisfaction and cleanliness ratings, and is also the cheapest city despite having the highest average person capacity.
Amsterdam is the most expensive, presumably because it has the highest average number of bedrooms.
London has the best attraction index but the worst guest satisfaction and cleanliness ratings.
Listing prices on weekdays and weekends are similar, except in Amsterdam where prices are higher on weekends.
Listing prices for superhosts and non-superhosts do not differ much in most cities.
df_ = df.groupby('city').agg({'price_per_night':'median', 'room_shared':'mean', 'bedrooms':'mean', 'person_capacity':'mean',
'host_is_superhost':'mean', 'multi':'mean', 'biz':'mean',
'guest_satisfaction_overall':'mean', 'cleanliness_rating':'mean',
'dist':'mean', 'metro_dist':'mean',
'attr_index_norm':'mean', 'rest_index_norm':'mean'}).reset_index().sort_values('price_per_night')
df_
| | city | price_per_night | room_shared | bedrooms | person_capacity | host_is_superhost | multi | biz | guest_satisfaction_overall | cleanliness_rating | dist | metro_dist | attr_index_norm | rest_index_norm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | athens | 63.857709 | 0.002083 | 1.271402 | 3.698106 | 0.428598 | 0.267424 | 0.379924 | 95.003598 | 9.638447 | 1.803080 | 0.478656 | 5.740839 | 9.954268 |
| 4 | budapest | 76.491047 | 0.003481 | 1.105669 | 3.540776 | 0.378916 | 0.303332 | 0.348831 | 94.585281 | 9.477374 | 1.872763 | 0.544059 | 12.675248 | 34.529089 |
| 8 | rome | 91.295911 | 0.001329 | 1.229755 | 3.357372 | 0.326687 | 0.385953 | 0.334109 | 93.122300 | 9.514678 | 3.026982 | 0.819794 | 10.426968 | 25.078056 |
| 3 | berlin | 95.587548 | 0.029388 | 1.070451 | 2.774960 | 0.257246 | 0.276167 | 0.174718 | 94.323671 | 9.461755 | 5.257093 | 0.836064 | 16.803111 | 30.666967 |
| 2 | barcelona | 104.149696 | 0.004236 | 1.161313 | 2.616661 | 0.181433 | 0.385104 | 0.325450 | 91.109072 | 9.291564 | 2.116982 | 0.441248 | 16.636220 | 19.376528 |
| 9 | vienna | 104.247014 | 0.004524 | 1.102347 | 3.350297 | 0.284139 | 0.279050 | 0.339836 | 93.731128 | 9.472434 | 3.139488 | 0.526670 | 8.762474 | 4.239580 |
| 5 | lisbon | 112.687617 | 0.012841 | 1.272428 | 3.343398 | 0.213951 | 0.239459 | 0.587541 | 91.093875 | 9.370640 | 1.966893 | 0.711482 | 7.324730 | 28.274084 |
| 6 | london | 130.647475 | 0.005004 | 1.128790 | 2.846192 | 0.157410 | 0.274992 | 0.387872 | 90.645652 | 9.175023 | 5.326421 | 1.005547 | 20.537398 | 11.234105 |
| 7 | paris | 158.798583 | 0.014055 | 0.972787 | 2.953648 | 0.140700 | 0.219498 | 0.245813 | 92.037530 | 9.263606 | 2.995823 | 0.227323 | 18.204358 | 42.589111 |
| 0 | amsterdam | 230.122091 | 0.004808 | 1.292308 | 2.781731 | 0.284135 | 0.283173 | 0.105288 | 94.514423 | 9.465865 | 2.825052 | 1.089367 | 14.246499 | 26.097566 |
df_ = df.groupby('city').agg({'price_per_night':'std', 'room_shared':'std',
'bedrooms':'std', 'person_capacity':'std',
'host_is_superhost':'std', 'multi':'std', 'biz':'std',
'guest_satisfaction_overall':'std', 'cleanliness_rating':'std',
'dist':'std', 'metro_dist':'std',
'attr_index_norm':'std', 'rest_index_norm':'std'}).reset_index().sort_values('price_per_night')
df_
| | city | price_per_night | room_shared | bedrooms | person_capacity | host_is_superhost | multi | biz | guest_satisfaction_overall | cleanliness_rating | dist | metro_dist | attr_index_norm | rest_index_norm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | lisbon | 54.486540 | 0.112596 | 0.728539 | 1.344214 | 0.410128 | 0.426790 | 0.492320 | 9.148114 | 0.924080 | 1.742681 | 0.920204 | 5.082390 | 17.877309 |
| 8 | rome | 59.309052 | 0.036438 | 0.549710 | 1.309052 | 0.469028 | 0.486847 | 0.471704 | 7.815107 | 0.808415 | 1.644095 | 0.631361 | 6.631054 | 13.414188 |
| 4 | budapest | 65.572403 | 0.058903 | 0.663484 | 1.256548 | 0.485177 | 0.459754 | 0.476660 | 6.525680 | 0.842693 | 1.874925 | 0.856410 | 6.672535 | 19.111435 |
| 3 | berlin | 117.664645 | 0.168926 | 0.552033 | 1.188142 | 0.437204 | 0.447191 | 0.379802 | 6.809406 | 0.849384 | 3.692649 | 1.267283 | 10.774273 | 16.634505 |
| 1 | athens | 132.940027 | 0.045600 | 0.652575 | 1.284703 | 0.494922 | 0.442657 | 0.485414 | 8.348637 | 0.839767 | 0.953738 | 0.284154 | 4.667181 | 10.778060 |
| 7 | paris | 165.474872 | 0.117727 | 0.642571 | 1.215007 | 0.347738 | 0.413937 | 0.430601 | 8.818201 | 0.974036 | 1.463542 | 0.122769 | 7.759372 | 15.680438 |
| 2 | barcelona | 177.733944 | 0.064956 | 0.517108 | 1.153124 | 0.385445 | 0.486706 | 0.468625 | 8.607153 | 1.014577 | 1.377859 | 0.284540 | 9.591798 | 10.256285 |
| 9 | vienna | 198.873582 | 0.067115 | 0.602819 | 1.282163 | 0.451067 | 0.448596 | 0.473720 | 7.220808 | 0.855439 | 1.942337 | 0.516132 | 6.259531 | 3.683256 |
| 0 | amsterdam | 215.329203 | 0.069187 | 0.736683 | 1.032634 | 0.451110 | 0.450648 | 0.306999 | 6.350874 | 0.813421 | 2.082573 | 0.831669 | 10.335158 | 17.720931 |
| 6 | london | 235.678632 | 0.070562 | 0.579477 | 1.246235 | 0.364205 | 0.446533 | 0.487289 | 11.510622 | 1.166180 | 2.712573 | 1.263926 | 11.914575 | 6.963803 |
metric_list = ['price_per_night','room_shared','bedrooms','person_capacity',
'host_is_superhost','multi', 'biz', 'guest_satisfaction_overall','cleanliness_rating',
'dist', 'metro_dist', 'attr_index_norm', 'rest_index_norm']
highest = []
lowest = []
for metric in metric_list:
    highest.append(df_.sort_values(metric)['city'].values[-1])
    lowest.append(df_.sort_values(metric)['city'].values[0])
pd.DataFrame({'metric':metric_list, 'highest':highest, 'lowest':lowest})
| | metric | highest | lowest |
|---|---|---|---|
| 0 | price_per_night | london | lisbon |
| 1 | room_shared | berlin | rome |
| 2 | bedrooms | amsterdam | barcelona |
| 3 | person_capacity | lisbon | amsterdam |
| 4 | host_is_superhost | athens | paris |
| 5 | multi | rome | paris |
| 6 | biz | lisbon | amsterdam |
| 7 | guest_satisfaction_overall | london | amsterdam |
| 8 | cleanliness_rating | london | rome |
| 9 | dist | berlin | athens |
| 10 | metro_dist | berlin | paris |
| 11 | attr_index_norm | london | athens |
| 12 | rest_index_norm | budapest | vienna |
fig = go.Figure()
for week_part in ['weekday', 'weekend']:
    fig.add_trace(go.Box(y=df[df['week_part']==week_part]['price_per_night'],
                         x=df[df['week_part']==week_part]['city'],
                         name=week_part))
fig.update_layout(
boxmode='group',
font_color="black",
title_font_color="black",
title='Box plot of listing prices<sup><br>Listing prices on weekdays and weekends are similar, except in Amsterdam where prices are higher on weekends. </sup>',
plot_bgcolor='rgba(0, 0, 0, 0)',
yaxis=dict(title='price per night for two (euros)',
range=[0,800],
tickformat=',.0f'),
)
fig.show()
Here I computed the Pearson correlation coefficients* between all variables. The following variables appear to be positively correlated:
These correlations are statistically significant, as shown in the p-value matrix.
Nothing surprising here.
* The Pearson correlation coefficient is the ratio between the covariance of two variables and the product of their standard deviations, so it is essentially a normalized measure of covariance.
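The footnote's definition can be checked numerically with a small NumPy sketch (using population standard deviations, as in the formula; the sample values are made up for illustration):

```python
import numpy as np

def pearson_coef(x, y):
    # covariance of x and y divided by the product of their standard deviations
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = ((x - x.mean()) * (y - y.mean())).mean()
    return cov / (x.std() * y.std())

# a perfectly linear relationship gives a coefficient of (essentially) 1
print(pearson_coef([1, 2, 3, 4], [2, 4, 6, 8]))
```

This matches what `scipy.stats.pearsonr` returns as its first element; `pearsonr` additionally reports the p-value used in the matrix below.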
vars_of_interest = ['price_per_night','room_shared','bedrooms','person_capacity',
'host_is_superhost', 'multi', 'biz',
'guest_satisfaction_overall','cleanliness_rating',
'dist', 'metro_dist', 'attr_index_norm', 'rest_index_norm']
df_corr = df[vars_of_interest].corr().round(2)
mask = np.triu(np.ones_like(df_corr, dtype=bool))
df_corr_vis = df_corr.mask(mask)
fig = px.imshow(df_corr_vis, text_auto=True,
color_continuous_scale=px.colors.diverging.PRGn,
color_continuous_midpoint=0)
fig.update_layout(plot_bgcolor='rgba(0, 0, 0, 0)',
title='Pearson correlation coefficients',
xaxis=dict(tickangle=90),
height=600
)
fig.show()
df_pvals = pd.DataFrame(index=vars_of_interest)
for col_metric in vars_of_interest:
    pvals_list = []
    for row_metric in vars_of_interest:
        pvals_list.append(pearsonr(df[col_metric], df[row_metric])[1].round(2))
    df_pvals[col_metric] = pvals_list
df_pvals_vis = df_pvals.mask(mask)
fig = px.imshow(df_pvals_vis, text_auto=True,
color_continuous_scale=px.colors.sequential.Purples_r,
color_continuous_midpoint=0.1
)
fig.update_layout(plot_bgcolor='rgba(0, 0, 0, 0)',
title='''p-values of the Pearson correlation coefficients
<br><sup>(zeros mean the correlation is so statistically significant that the p-value is smaller than the smallest representable floating-point number)</sup>''',
xaxis=dict(tickangle=90),
height=600
)
fig.show()
Scatterplots are a quick way of displaying the relationship between two variables. To explore whether prices in different cities might be affected by different variables, I created a scatterplot per city and found that:
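The plots below call `update_annotations_and_layout`, a helper defined earlier in the notebook. Since its body is not shown in this section, here is a minimal sketch of what such a helper typically does for faceted plotly figures (the exact implementation is an assumption): strip the `city=` prefix from facet titles and match the transparent background used above.

```python
def strip_facet_prefix(title_text):
    # plotly facet annotations look like 'city=paris'; keep only the value
    return title_text.split('=')[-1]

def update_annotations_and_layout(fig):
    # hypothetical helper: tidy facet titles and use a transparent background
    fig.for_each_annotation(lambda a: a.update(text=strip_facet_prefix(a.text)))
    fig.update_layout(plot_bgcolor='rgba(0, 0, 0, 0)')
    return fig
```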
fig = px.scatter(df,
x='dist',
y='price_per_night',
hover_data=['price_per_night', 'dist'],
color='city',
facet_col='city',
facet_col_wrap=5,
title='Distance from centre vs price',
trendline='ols',
height=600, width=1000
)
update_annotations_and_layout(fig)
fig.show()
fig = px.scatter(df,
x='metro_dist',
y='price_per_night',
hover_data=['price_per_night', 'metro_dist'],
color='city',
facet_col='city',
facet_col_wrap=5,
title='Distance from metro vs price',
trendline='ols',
height=600, width=1000
)
update_annotations_and_layout(fig)
fig.show()
fig = px.scatter(df,#[df['person_capacity']==2],
x='guest_satisfaction_overall',
y='price_per_night',
hover_data=['price_per_night', 'guest_satisfaction_overall'],
color='city',
facet_col='city',
facet_col_wrap=5,
title='Guest satisfaction vs price',
trendline='ols',
height=600, width=1000
)
update_annotations_and_layout(fig)
fig.show()
Here I experimented with two approaches to predicting listing prices: linear regression and decision tree regression. Cross-validation is used to identify the best model in each approach. Linear regression performed better than the decision tree, suggesting that the relationship between prices and the explanatory variables is more linear than nonlinear. It is also possible that the decision tree is overfitting the data, capturing noise and irrelevant patterns instead of the true underlying relationship between variables; decision trees are more prone to overfitting than linear regression models.
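The overfitting point can be illustrated on synthetic data (a sketch, not the Airbnb data): on a noisy linear signal, a fully grown tree fits the training set almost perfectly but generalizes worse than a plain linear fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=(400, 1))
y = 2 * x.ravel() + rng.normal(0, 3, size=400)  # linear signal + noise

X_tr, X_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=0)

tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)  # fully grown, no depth limit
lin = LinearRegression().fit(X_tr, y_tr)

# the tree memorizes the training noise (train R² near 1) yet scores
# below the linear model on held-out data
print(f'tree train R2: {tree.score(X_tr, y_tr):.2f}')
print(f'tree test  R2: {tree.score(X_te, y_te):.2f}')
print(f'lin  test  R2: {lin.score(X_te, y_te):.2f}')
```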
# create dummy variables for cities
city_dummies = pd.get_dummies(df['city'], prefix='city')
# create dummy variables for room type
room_type_dummies = pd.get_dummies(df['room_type'], prefix='room_type')
# merge city and room type dummies with the data
df_all_features = pd.concat([df, city_dummies, room_type_dummies], axis=1)
# convert boolean columns to 0/1 integers
for col in ['room_shared', 'host_is_superhost']:
    df_all_features[col] = 1*df_all_features[col]
# select the relevant variables and load into x and y
X = df_all_features[['person_capacity', 'bedrooms', 'host_is_superhost', 'biz',
'cleanliness_rating', 'guest_satisfaction_overall',
'dist', 'metro_dist', 'attr_index_norm', 'rest_index_norm', 'is_weekend',
]+city_dummies.columns.tolist()+room_type_dummies.columns.tolist()].drop(columns=['city_lisbon', 'room_type_Shared room'])
y = np.array(df_all_features['price_per_night']).astype(float)
Here I use StandardScaler() to standardize features to have a mean of zero and a standard deviation of one. This is because I am going to use recursive feature elimination (RFE) for feature selection. RFE ranks features based on the absolute values of the coefficients, therefore it is important that the features are on the same scale.
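To see why scale matters for coefficient-based ranking, here is a small NumPy-only sketch with hypothetical distance data: expressing the same feature in metres instead of kilometres shrinks the raw OLS coefficient by 1000x, while the coefficients on standardized versions of the two features agree.

```python
import numpy as np

rng = np.random.default_rng(0)
dist_km = rng.uniform(0, 5, 200)        # hypothetical distances in kilometres
dist_m = dist_km * 1000                 # identical information in metres
y = 3 * dist_km + rng.normal(0, 1, 200)

def ols_slope(x, y):
    # slope of a one-feature least-squares fit with intercept
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def standardize(x):
    return (x - x.mean()) / x.std()

print(ols_slope(dist_km, y))               # roughly 3
print(ols_slope(dist_m, y))                # roughly 0.003: 1000x smaller
print(ols_slope(standardize(dist_km), y))  # identical for both units
print(ols_slope(standardize(dist_m), y))
```

RFE would rank the metre-scaled feature as unimportant purely because of its units, which is exactly what standardizing avoids.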
from sklearn import preprocessing
standard_scaler = preprocessing.StandardScaler()
X = pd.DataFrame(standard_scaler.fit_transform(X), columns=X.columns)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Both models are fitted with hyperparameter tuning using grid search cross-validation, evaluated by R-squared (the coefficient of determination).
In the script below I will use grid search to tune the number of features to include in the model. The best parameter is identified by cross validation.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_squared_error, r2_score
# create an instance of the linear regression model that uses RFE (recursive feature elimination)
lr = LinearRegression()
rfelr = RFE(lr)
# specify the number of features as the hyperparameter to tune
param_grid = [{'n_features_to_select': range(1, len(X.columns))}]
# create an instance of the grid search object, specifying a 5-fold cross validation
grid_search = GridSearchCV(estimator=rfelr, param_grid=param_grid, scoring='r2', cv=5)
# fit the grid search object on the training data
import time
start_time = time.time()
grid_search.fit(X_train, y_train)
print(f'{(time.time() - start_time):.0f} seconds' )
print('Best hyperparameters: ', grid_search.best_params_)
Fit the model with the best n_features_to_select and test performance
# fit the model on the training set using the best hyperparameters
lr_best = RFE(LinearRegression(), **grid_search.best_params_)
lr_best = lr_best.fit(X_train, y_train)
# test performance
y_pred = lr_best.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean squared error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')
# perform cross-validation to get a more robust estimate of the model performance
cv_scores = cross_val_score(lr_best, X, y, cv=5, scoring='r2')
print(f'Cross-validation R-squared: {["%.2f" % scr for scr in cv_scores]}' )
print(f'Mean cross-validation R-squared: {cv_scores.mean():.2f}', )
cv_scores = cross_val_score(lr_best, X, y, cv=5, scoring='neg_mean_squared_error')
print(f'Cross-validation neg MSE: {["%.0f" % scr for scr in cv_scores]}')
print(f'Mean cross-validation neg MSE: {cv_scores.mean():.0f}')
An alternative to linear regression is decision tree regression. Here I will be replicating the above process but instead of the number of features, I will be tuning the maximum tree depth instead. Tree depth is the number of splits a tree can make before reaching a decision.
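To make "depth" concrete: a depth-1 tree (a stump) makes a single split and predicts one mean per side; deeper trees repeat this recursively in each leaf. A plain-NumPy sketch of a stump (illustrative only, not how scikit-learn is implemented internally):

```python
import numpy as np

def fit_stump(x, y):
    # depth-1 regression tree: try every split point and keep the one
    # minimizing the summed squared error around the two leaf means
    best = None
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_mean, right_mean = best
    return lambda v: np.where(v <= t, left_mean, right_mean)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 10.0, 10.0])
predict = fit_stump(x, y)
print(predict(x))  # splits at x <= 2 and predicts the two leaf means, 0 and 10
```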
from sklearn.tree import DecisionTreeRegressor
# create an instance of the decision tree regressor
dcr = DecisionTreeRegressor()
# specify maximum depth as the hyperparameter to tune
param_grid = [{'max_depth':range(1,8)}]
# create an instance of grid search object for the decision tree regressor
grid_search_dcr = GridSearchCV(estimator=dcr, scoring='r2', param_grid=param_grid, cv=5)
start_time = time.time()
# fit the grid search object on the training data
grid_search_dcr.fit(X_train, y_train)
print(f'{(time.time() - start_time):.0f} seconds' )
print('Best hyperparameters: ', grid_search_dcr.best_params_)
Fit the model with the best max_depth and test performance
# fit the model on the training set using the best hyperparameters
dcr_best = DecisionTreeRegressor(**grid_search_dcr.best_params_)
dcr_best.fit(X_train, y_train)
# test performance
y_pred_dcr = dcr_best.predict(X_test)
mse = mean_squared_error(y_test, y_pred_dcr)
r2 = r2_score(y_test, y_pred_dcr)
print(f'Mean squared error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')
# perform cross-validation to get a more robust estimate of the model performance
cv_scores = cross_val_score(dcr_best, X, y, cv=5, scoring='r2')
print(f'Cross-validation R-squared: {["%.2f" % scr for scr in cv_scores]}' )
print(f'Mean cross-validation R-squared: {cv_scores.mean():.2f}', )
cv_scores = cross_val_score(dcr_best, X, y, cv=5, scoring='neg_mean_squared_error')
print(f'Cross-validation neg MSE: {["%.0f" % scr for scr in cv_scores]}')
print(f'Mean cross-validation neg MSE: {cv_scores.mean():.0f}')
RFE picked 17 out of 22 input features for the linear regression model, and all 17 were statistically significant (p-value < 0.05). Here are some interesting observations:
Using the statsmodels library to retrieve the p-values of each included feature
The decision tree regressor assigned a nonzero feature importance to only 6 features; all other features had an importance score of zero.
pd.DataFrame({'feature_name':lr_best.get_feature_names_out(), 'feature_coef': lr_best.estimator_.coef_}).sort_values('feature_coef', ascending=False)
| | feature_name | feature_coef |
|---|---|---|
| 15 | room_type_Entire home/apt | 47.581816 |
| 6 | attr_index_norm | 33.434194 |
| 7 | city_amsterdam | 32.171210 |
| 1 | bedrooms | 27.833197 |
| 16 | room_type_Private room | 19.500774 |
| 12 | city_london | 18.830633 |
| 13 | city_paris | 18.778153 |
| 0 | person_capacity | 15.552854 |
| 9 | city_barcelona | 9.085896 |
| 2 | biz | 6.633180 |
| 3 | cleanliness_rating | 2.902304 |
| 4 | guest_satisfaction_overall | 2.836171 |
| 10 | city_berlin | 2.340874 |
| 5 | metro_dist | -3.746838 |
| 14 | city_rome | -7.344834 |
| 11 | city_budapest | -14.770777 |
| 8 | city_athens | -17.213019 |
[feat for feat in lr_best.feature_names_in_ if feat not in lr_best.get_feature_names_out()]
['host_is_superhost', 'dist', 'rest_index_norm', 'is_weekend', 'city_vienna']
Using the statsmodels library to get the p-values of the feature coefficients
x = df_all_features[lr_best.get_feature_names_out().tolist()]
x = sm.add_constant(x)
model = sm.OLS(y, x).fit()
model.summary()
| Dep. Variable: | y | R-squared: | 0.241 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.241 |
| Method: | Least Squares | F-statistic: | 967.5 |
| Date: | Fri, 17 Mar 2023 | Prob (F-statistic): | 0.00 |
| Time: | 09:29:26 | Log-Likelihood: | -3.2992e+05 |
| No. Observations: | 51707 | AIC: | 6.599e+05 |
| Df Residuals: | 51689 | BIC: | 6.600e+05 |
| Df Model: | 17 | | |
| Covariance Type: | nonrobust | | |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -147.9687 | 10.621 | -13.931 | 0.000 | -168.787 | -127.151 |
| person_capacity | 12.0946 | 0.691 | 17.494 | 0.000 | 10.739 | 13.450 |
| bedrooms | 43.9366 | 1.251 | 35.123 | 0.000 | 41.485 | 46.388 |
| biz | 14.0294 | 1.411 | 9.940 | 0.000 | 11.263 | 16.796 |
| cleanliness_rating | 3.1170 | 0.947 | 3.293 | 0.001 | 1.262 | 4.972 |
| guest_satisfaction_overall | 0.3429 | 0.103 | 3.336 | 0.001 | 0.141 | 0.544 |
| metro_dist | -4.6670 | 0.812 | -5.748 | 0.000 | -6.258 | -3.076 |
| attr_index_norm | 3.2885 | 0.081 | 40.547 | 0.000 | 3.130 | 3.447 |
| city_amsterdam | 165.1652 | 3.610 | 45.755 | 0.000 | 158.090 | 172.240 |
| city_athens | -57.2870 | 2.507 | -22.852 | 0.000 | -62.201 | -52.373 |
| city_barcelona | 41.1882 | 3.272 | 12.588 | 0.000 | 34.775 | 47.601 |
| city_berlin | 12.0632 | 3.418 | 3.529 | 0.000 | 5.363 | 18.763 |
| city_budapest | -55.3180 | 2.752 | -20.102 | 0.000 | -60.712 | -49.924 |
| city_london | 49.3805 | 2.434 | 20.289 | 0.000 | 44.610 | 54.151 |
| city_paris | 57.1948 | 2.488 | 22.993 | 0.000 | 52.319 | 62.070 |
| city_rome | -18.9585 | 2.153 | -8.806 | 0.000 | -23.178 | -14.739 |
| room_type_Entire home/apt | 100.5021 | 7.560 | 13.293 | 0.000 | 85.684 | 115.320 |
| room_type_Private room | 42.6894 | 7.641 | 5.587 | 0.000 | 27.713 | 57.666 |
| Omnibus: | 134639.178 | Durbin-Watson: | 1.908 |
|---|---|---|---|
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 4185074797.675 |
| Skew: | 30.258 | Prob(JB): | 0.00 |
| Kurtosis: | 1395.428 | Cond. No. | 2.09e+03 |
pd.DataFrame({'feature_name':dcr_best.feature_names_in_, 'feature_importance': dcr_best.feature_importances_}).sort_values('feature_importance', ascending=False)
| | feature_name | feature_importance |
|---|---|---|
| 1 | bedrooms | 0.404788 |
| 8 | attr_index_norm | 0.388031 |
| 11 | city_amsterdam | 0.093544 |
| 0 | person_capacity | 0.054500 |
| 20 | room_type_Entire home/apt | 0.041960 |
| 15 | city_budapest | 0.017177 |
| 4 | cleanliness_rating | 0.000000 |
| 5 | guest_satisfaction_overall | 0.000000 |
| 19 | city_vienna | 0.000000 |
| 18 | city_rome | 0.000000 |
| 17 | city_paris | 0.000000 |
| 16 | city_london | 0.000000 |
| 14 | city_berlin | 0.000000 |
| 13 | city_barcelona | 0.000000 |
| 12 | city_athens | 0.000000 |
| 2 | host_is_superhost | 0.000000 |
| 10 | is_weekend | 0.000000 |
| 9 | rest_index_norm | 0.000000 |
| 3 | biz | 0.000000 |
| 7 | metro_dist | 0.000000 |
| 6 | dist | 0.000000 |
| 21 | room_type_Private room | 0.000000 |